1 research outputs found
RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models
Large language model (LLM) applications in cloud root cause analysis (RCA)
have been actively explored recently. However, current methods are still
reliant on manual workflow settings and do not unleash LLMs' decision-making
and environment interaction capabilities. We present RCAgent, a tool-augmented
LLM autonomous agent framework for practical and privacy-aware industrial RCA
usage. Running on an internally deployed model rather than GPT families,
RCAgent is capable of free-form data collection and comprehensive analysis with
tools. Our framework combines a variety of enhancements, including a unique
Self-Consistency for action trajectories, and a suite of methods for context
management, stabilization, and importing domain knowledge. Our experiments show
RCAgent's evident and consistent superiority over ReAct across all aspects of
RCA -- predicting root causes, solutions, evidence, and responsibilities -- and
tasks covered or uncovered by current rules, as validated by both automated
metrics and human evaluations. Furthermore, RCAgent has already been integrated
into the diagnosis and issue discovery workflow of the Real-time Compute
Platform for Apache Flink of Alibaba Cloud